{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7b4be307",
   "metadata": {},
   "source": [
    "* 本周主要内容：语音识别\n",
    "* 时间：week10\n",
    "* 记录人：赖文佩"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76be4914",
   "metadata": {},
   "source": [
    "# 本周内容"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccf2c626",
   "metadata": {},
   "source": [
    "## 知识概念"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e58f7a5",
   "metadata": {},
   "source": [
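 1.一切都是I/O">
    "> 1. Everything is I/O: generalized knowledge from software engineering to product  \n",
    "> 2. Speech recognition  \n",
    "> 3. Voice wake-up  \n",
    "> 4. Automatic Speech Recognition (ASR)  \n",
    ">> 1. Its goal is to have a computer automatically convert human speech into the corresponding text. This differs from speaker identification and speaker verification, which try to identify or confirm who is speaking rather than the words being spoken.  \n"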
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2d55711",
   "metadata": {},
   "source": [
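 5.语音合成">
    "> 5. Speech synthesis: text-to-speech (TTS)\n",
    ">> 1. A technology that converts text into speech. It plays the role of the human mouth, speaking the intended content in different voices. A speech synthesis system has two main parts: linguistic analysis (the front end) and the acoustic system (the back end). The linguistic analysis part analyzes the input text and produces a linguistic specification that describes how the text should be read. The acoustic system part takes that linguistic specification and generates the corresponding audio, which is what actually produces the sound."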
   ]
  },
  {
   "cell_type": "markdown",
   "id": "338a3a60",
   "metadata": {},
   "source": [
    "# 语音识别测试"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "61ef63ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "API_KEY = '0lrXE5VkFzVV8RSl2LNBIgQA'\n",
    "SECRET_KEY = 'l9lWYFjzKE0tqR8jWz1CGgAOXRdUvNpS'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c3ec1549",
   "metadata": {},
   "outputs": [],
   "source": [
    "# coding=utf-8\n",
    "\n",
    "import sys\n",
    "import json\n",
    "import time\n",
    "\n",
    "IS_PY3 = sys.version_info.major == 3\n",
    "\n",
    "if IS_PY3:\n",
    "    from urllib.request import urlopen\n",
    "    from urllib.request import Request\n",
    "    from urllib.error import URLError\n",
    "    from urllib.parse import urlencode\n",
    "\n",
    "    timer = time.perf_counter\n",
    "else:\n",
    "    import urllib2\n",
    "    from urllib2 import urlopen\n",
    "    from urllib2 import Request\n",
    "    from urllib2 import URLError\n",
    "    from urllib import urlencode\n",
    "\n",
    "    if sys.platform == \"win32\":\n",
    "        timer = time.clock\n",
    "    else:\n",
    "        # On most other platforms the best timer is time.time()\n",
    "        timer = time.time\n",
    "\n",
    "\n",
    "# 需要识别的文件\n",
    "AUDIO_FILE = 'audio/16k.wav'  # 只支持 pcm/wav/amr 格式，极速版额外支持m4a 格式\n",
    "# 文件格式\n",
    "FORMAT = AUDIO_FILE[-3:];  # 文件后缀只支持 pcm/wav/amr 格式，极速版额外支持m4a 格式\n",
    "\n",
    "CUID = '123456PYTHON';\n",
    "# 采样率\n",
    "RATE = 16000;  # 固定值\n",
    "\n",
    "# 普通版\n",
    "\n",
    "DEV_PID = 1537;  # 1537 表示识别普通话，使用输入法模型。根据文档填写PID，选择语言及识别模型\n",
    "ASR_URL = 'http://vop.baidu.com/server_api'\n",
    "SCOPE = 'audio_voice_assistant_get'  # 有此scope表示有asr能力，没有请在网页里勾选，非常旧的应用可能没有\n",
    "\n",
    "\n",
    "# 极速版\n",
    "\n",
    "class DemoError(Exception):\n",
    "    pass\n",
    "\n",
    "\n",
    "\"\"\"  TOKEN start \"\"\"\n",
    "\n",
    "TOKEN_URL = 'http://aip.baidubce.com/oauth/2.0/token'\n",
    "\n",
    "\n",
    "def fetch_token(API_KEY,SECRET_KEY):\n",
    "    params = {'grant_type': 'client_credentials',\n",
    "              'client_id': API_KEY,\n",
    "              'client_secret': SECRET_KEY}\n",
    "    post_data = urlencode(params)\n",
    "    if (IS_PY3):\n",
    "        post_data = post_data.encode('utf-8')\n",
    "    req = Request(TOKEN_URL, post_data)\n",
    "    try:\n",
    "        f = urlopen(req)\n",
    "        result_str = f.read()\n",
    "    except URLError as err:\n",
    "        print('token http response http code : ' + str(err.code))\n",
    "        result_str = err.read()\n",
    "    if (IS_PY3):\n",
    "        result_str = result_str.decode()\n",
    "\n",
    "#     print(result_str)\n",
    "    result = json.loads(result_str)\n",
    "#     print(result)\n",
    "    if ('access_token' in result.keys() and 'scope' in result.keys()):\n",
    "        if SCOPE and (not SCOPE in result['scope'].split(' ')):  # SCOPE = False 忽略检查\n",
    "            raise DemoError('scope is not correct')\n",
    "#         print('SUCCESS WITH TOKEN: %s ; EXPIRES IN SECONDS: %s' % (result['access_token'], result['expires_in']))\n",
    "        return result['access_token']\n",
    "    else:\n",
    "        raise DemoError('MAYBE API_KEY or SECRET_KEY not correct: access_token or scope not found in token response')\n",
    "\n",
    "\n",
    "\"\"\"  TOKEN end \"\"\"\n",
    "\n",
    "def asr(token,AUDIO_FILE):\n",
    "    speech_data = []\n",
    "    with open(AUDIO_FILE, 'rb') as speech_file:\n",
    "        speech_data = speech_file.read()\n",
    "    length = len(speech_data)\n",
    "    if length == 0:\n",
    "        raise DemoError('file %s length read 0 bytes' % AUDIO_FILE)\n",
    "\n",
    "    params = {'cuid': CUID, 'token': token, 'dev_pid': DEV_PID}\n",
    "    #测试自训练平台需要打开以下信息\n",
    "    #params = {'cuid': CUID, 'token': token, 'dev_pid': DEV_PID, 'lm_id' : LM_ID}\n",
    "    params_query = urlencode(params);\n",
    "\n",
    "    headers = {\n",
    "        'Content-Type': 'audio/' + FORMAT + '; rate=' + str(RATE),\n",
    "        'Content-Length': length\n",
    "    }\n",
    "\n",
    "    url = ASR_URL + \"?\" + params_query\n",
    "#     print(\"url is\", url);\n",
    "#     print(\"header is\", headers)\n",
    "    # print post_data\n",
    "    req = Request(ASR_URL + \"?\" + params_query, speech_data, headers)\n",
    "    try:\n",
    "        begin = timer()\n",
    "        f = urlopen(req)\n",
    "        result_str = f.read()\n",
    "        print(\"Request time cost %f\" % (timer() - begin))\n",
    "    except  URLError as err:\n",
    "        print('asr http response http code : ' + str(err.code))\n",
    "        result_str = err.read()\n",
    "\n",
    "    if (IS_PY3):\n",
    "        result_str = str(result_str, 'utf-8')\n",
    "#     print(result_str)\n",
    "    with open(\"result.txt\", \"w\") as of:\n",
    "        of.write(result_str)\n",
    "    return result_str"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fee439db",
   "metadata": {},
   "source": [
    "# 语音识别执行如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "520122c6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'24.23c73911c0982402e6b425b7bdb92c7a.2592000.1686663695.282335-33297582'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xu_token = fetch_token(API_KEY,SECRET_KEY)\n",
    "xu_token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "024f7732",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Request time cost 0.835606\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'{\"corpus_no\":\"7233033680130569254\",\"err_msg\":\"success.\",\"err_no\":0,\"result\":[\"北京科技馆。\"],\"sn\":\"144348102351684071886\"}\\n'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "asr(xu_token,'D:/API-AL-ML/api-al-ml/week12/speech-demo-master/rest-api-asr/python/audio/16k.pcm')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e9424ec",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
